#install.packages("plotly", repos = "http://cran.us.r-project.org")
library(ggplot2)
library(plotly)
library(data.table)
library(RColorBrewer)

MyAnimeList Exploratory Data Analysis

This project is an exploration of the MyAnimeList dataset provided on Kaggle.com.

Importing the data

To start, we import the data into R and inspect its structure:

anime_data <- fread("dataanime.csv")
str(anime_data)
## Classes 'data.table' and 'data.frame':   1563 obs. of  20 variables:
##  $ Title          : chr  "Fullmetal Alchemist: Brotherhood" "Kimi no Na wa." "Gintama°" "Steins;Gate 0" ...
##  $ Type           : chr  "TV" "Movie" "TV" "TV" ...
##  $ Episodes       : chr  "64" "1" "51" "23" ...
##  $ Status         : chr  "Finished Airing" "Finished Airing" "Finished Airing" "Currently Airing" ...
##  $ Start airing   : chr  "2009-4-5" "2016-8-26" "2015-4-8" "2018-4-12" ...
##  $ End airing     : chr  "2010-7-4" "-" "2016-3-30" "-" ...
##  $ Starting season: chr  "Spring" "-" "Spring" "Spring" ...
##  $ Broadcast time : chr  "Sundays at 17:00 (JST)" "-" "Wednesdays at 18:00 (JST)" "Thursdays at 01:35 (JST)" ...
##  $ Producers      : chr  "Aniplex,Square Enix,Mainichi Broadcasting System,Studio Moriken" "Kadokawa Shoten,Toho,Sound Team Don Juan,Lawson HMV Entertainment,Amuse,East Japan Marketing & Communications" "TV Tokyo,Aniplex,Dentsu" "Nitroplus" ...
##  $ Licensors      : chr  "Funimation,Aniplex of America" "Funimation,NYAV Post" "Funimation,Crunchyroll" "Funimation" ...
##  $ Studios        : chr  "Bones" "CoMix Wave Films" "Bandai Namco Pictures" "White Fox" ...
##  $ Sources        : chr  "Manga" "Original" "Manga" "Visual novel" ...
##  $ Genres         : chr  "Action,Military,Adventure,Comedy,Drama,Magic,Fantasy,Shounen" "Supernatural,Drama,Romance,School" "Action,Comedy,Historical,Parody,Samurai,Sci-Fi,Shounen" "Sci-Fi,Thriller" ...
##  $ Duration       : chr  "24 min. per ep." "1 hr. 46 min." "24 min. per ep." "23 min. per ep." ...
##  $ Rating         : chr  "R" "PG-13" "R" "PG-13" ...
##  $ Score          : num  9.25 9.19 9.16 9.16 9.14 9.11 9.11 9.11 9.1 9.07 ...
##  $ Scored by      : int  719706 454969 70279 12609 552791 28452 90758 395162 26284 62582 ...
##  $ Members        : int  1176368 705186 194359 186331 990419 121772 212238 705225 80166 121612 ...
##  $ Favorites      : int  105387 33936 5597 1117 90365 8370 4533 63324 1961 1498 ...
##  $ Description    : chr  "\"\"In order for something to be obtained, something of equal value must be lost.\"\"\r\n\r\nAlchemy is bound b"| __truncated__ "Mitsuha Miyamizu, a high school girl, yearns to live the life of a boy in the bustling city of Tokyoâ\200”a dre"| __truncated__ "Gintoki, Shinpachi, and Kagura return as the fun-loving but broke members of the Yorozuya team! Living in an al"| __truncated__ "The dark untold story of Steins;Gate that leads with the eccentric mad scientist Okabe, struggling to recover f"| __truncated__ ...
##  - attr(*, ".internal.selfref")=<externalptr>

We can see that we have a data frame of 1563 anime with 20 variables each, including Title, Score, Sources, and Broadcast time. Most columns were read as character vectors, and missing entries appear as the placeholder "-".

Data Cleaning

First, we convert the categorical columns into factors:
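The conversion step could look roughly like the sketch below. It runs on a small made-up table (`toy`) rather than the real `anime_data`, and the choice of `Type`, `Status`, and `Rating` as the categorical columns is an assumption; the real analysis may also include columns such as `Starting season` and `Sources`.

```r
library(data.table)

# Toy stand-in for anime_data with a few of the columns shown by str() above
toy <- data.table(
  Title  = c("A", "B", "C"),
  Type   = c("TV", "Movie", "TV"),
  Status = c("Finished Airing", "Finished Airing", "Currently Airing"),
  Rating = c("R", "PG-13", "R"),
  Score  = c(9.25, 9.19, 9.16)
)

# Assumption: these are the columns we treat as categorical
factor_cols <- c("Type", "Status", "Rating")

# Convert the chosen columns to factors in place (data.table's := updates by reference)
toy[, (factor_cols) := lapply(.SD, as.factor), .SDcols = factor_cols]

str(toy)
```

Free-text columns such as `Title` are left as character, since turning them into factors would create one level per row.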

Next, we convert the date columns into the proper date format in R:

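# Parse the date strings; placeholder entries such as "-" do not match the format and become NA
anime_data$`Start airing` <-
    as.Date(anime_data$`Start airing`, format = "%Y-%m-%d")
anime_data$`End airing` <-
    as.Date(anime_data$`End airing`, format = "%Y-%m-%d")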

Now we plot what percentage of our data is missing (NA):
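One possible way to compute and plot the share of missing values is sketched below, per column and on a made-up table (`toy`). Only real `NA` values are counted here; whether "-" placeholders in character columns should also count as missing is left open, and the names `na_pct` and `na_df` are illustrative.

```r
library(data.table)
library(ggplot2)

# Toy stand-in: "-" in a date column becomes NA once parsed with as.Date()
toy <- data.table(
  Title        = c("A", "B", "C", "D"),
  `End airing` = as.Date(c("2010-7-4", "-", "2016-3-30", "-"), format = "%Y-%m-%d"),
  Score        = c(9.25, 9.19, 9.16, 9.16)
)

# Percentage of NA values in each column
na_pct <- sapply(toy, function(col) mean(is.na(col)) * 100)
na_df  <- data.frame(column = names(na_pct), pct_missing = as.numeric(na_pct))

# Horizontal bar chart, columns ordered by how much is missing
p <- ggplot(na_df, aes(x = reorder(column, pct_missing), y = pct_missing)) +
  geom_col() +
  coord_flip() +
  labs(x = "Column", y = "Missing values (%)")
print(p)
```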

Exploratory Data Analysis

In this part we ask a series of questions about the data and try to answer each one with a plot:

Which type of anime do we have more of?

I wanted a pie chart here but used a bar plot instead, because coord_polar("y") did not work for me. This appears to be an open issue in the ggplot2 package.
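A bar plot of counts per type can be sketched as follows. The data frame `toy` is made up, and `type_counts` is an illustrative name; on the real data the `Type` column of `anime_data` would take its place.

```r
library(ggplot2)

# Toy stand-in for the Type column
toy <- data.frame(Type = c("TV", "Movie", "TV", "TV", "OVA", "Movie"))

# Count the anime of each type
type_counts <- as.data.frame(table(Type = toy$Type), responseName = "n")

# Bars ordered from most to least frequent type
p <- ggplot(type_counts, aes(x = reorder(Type, -n), y = n)) +
  geom_col() +
  labs(x = "Type", y = "Number of anime")
print(p)
```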

Which years were the best for starting an anime series?
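"Best" can be read in more than one way. The sketch below assumes it means the highest mean Score among TV series grouped by the year they started airing. The table `toy` and its values are made up, and `start_year` and `by_year` are illustrative names.

```r
library(data.table)

# Toy stand-in with the columns needed for this question
toy <- data.table(
  Type           = c("TV", "TV", "Movie", "TV", "TV"),
  `Start airing` = as.Date(c("2009-4-5", "2009-10-1", "2016-8-26", "2015-4-8", "2015-7-1"),
                           format = "%Y-%m-%d"),
  Score          = c(9.0, 8.0, 9.19, 9.16, 8.84)
)

# Extract the start year from the parsed date
toy[, start_year := as.integer(format(`Start airing`, "%Y"))]

# Keep TV series only, then summarise per start year and sort by mean score
by_year <- toy[Type == "TV",
               .(n = .N, mean_score = mean(Score)),
               by = start_year][order(-mean_score)]
print(by_year)
```

Reporting the count `n` next to the mean helps to spot years whose average rests on only a handful of series.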

Which years were the best for ending an anime series?

Which year was the best for releasing an anime movie?

Which season has the most anime?

Which season is best for starting an anime series?

Which day of the week is the most anime heavy?
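The weekday is embedded in the Broadcast time string (for example "Sundays at 17:00 (JST)"), so it has to be extracted before counting. The sketch below uses base R on a few made-up values and assumes the day always appears as a plural name followed by " at", as in the sample shown by str(); other formats in the real data would end up as NA.

```r
# Toy values in the format shown by str(); "-" marks a missing broadcast time
broadcast <- c("Sundays at 17:00 (JST)", "-", "Wednesdays at 18:00 (JST)",
               "Thursdays at 01:35 (JST)", "Sundays at 23:00 (JST)")

# Entries that follow the "<Day> at ..." pattern
has_day <- grepl("^[A-Za-z]+ at ", broadcast)

# Extract the leading day name; everything else stays NA
day <- rep(NA_character_, length(broadcast))
day[has_day] <- sub("^([A-Za-z]+) at .*$", "\\1", broadcast[has_day])

# Count anime per weekday, keeping the days in calendar order
weekdays_plural <- c("Mondays", "Tuesdays", "Wednesdays", "Thursdays",
                     "Fridays", "Saturdays", "Sundays")
day_counts <- table(factor(day, levels = weekdays_plural))
print(day_counts)
```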

Which day of the week is best for broadcasting anime?

Which time slot has the most anime?
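A time slot can be derived from the Broadcast time string by pulling out the hour. The sketch below treats each whole hour (JST) as one slot, which is an assumption; the values are made up and follow the "HH:MM" format seen in the str() output.

```r
# Toy values in the format shown by str(); "-" marks a missing broadcast time
broadcast <- c("Sundays at 17:00 (JST)", "-", "Wednesdays at 18:00 (JST)",
               "Thursdays at 01:35 (JST)", "Sundays at 23:00 (JST)")

# Entries that contain a time of the form " at HH:MM"
has_time <- grepl(" at [0-9]{2}:[0-9]{2}", broadcast)

# Extract the hour as an integer; entries without a time stay NA
hour <- rep(NA_integer_, length(broadcast))
hour[has_time] <- as.integer(
  sub("^.* at ([0-9]{2}):[0-9]{2}.*$", "\\1", broadcast[has_time])
)

# Number of anime per hourly slot (NA entries are dropped by table())
slot_counts <- table(hour)
print(slot_counts)
```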

Which time slot is the best for anime?

What percentage of anime come from a manga?
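The share of each source can be computed with a frequency table, sketched here on a made-up vector standing in for the Sources column:

```r
# Toy stand-in for anime_data$Sources
sources <- c("Manga", "Original", "Manga", "Visual novel", "Manga")

# Percentage of anime per source, rounded to one decimal place
source_pct <- round(100 * prop.table(table(sources)), 1)
print(source_pct)

# Share of manga adaptations in this toy vector
manga_pct <- source_pct[["Manga"]]
```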

How does the source affect the popularity of an anime?

How does the rating affect popularity?

Modelling